Poster: Tom Gally Date: December 11, 2009 06:20:16pm
Forum: texts Subject: Missing images from Google book scans

Nearly every day, I check the following two URLs to see what books have been added to the Americana text collection, and nearly every day I come across unusual, interesting, and wonderful books.

http://www.archive.org/search.php?query=%28collection%3Aamericana%20AND%20format%3Apdf%20AND%20mediatype%3Atexts%29%20AND%20-mediatype%3Acollection&sort=-publicdate

http://www.archive.org/search.php?query=-description%3A%28Google%29%20AND%20%28collection%3Aamericana%20AND%20format%3Apdf%20AND%20mediatype%3Atexts%29%20AND%20-mediatype%3Acollection&sort=-publicdate

The first URL lists all of the most recent additions to the Americana collection, while the second URL excludes those taken from Google (nearly all added, it seems, by user tpb, who I presume is a bot). Even though it returns far fewer books, I prefer the latter URL because, too often, the scans from Google are bad in multiple ways: folded-over pages, visible fingers, resolution too coarse for the text to be read, and, worst of all, omitted images.
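
For anyone who wants to adapt these searches, here is a minimal Python sketch of how the two URLs above can be assembled from their unencoded query strings. The endpoint, query terms, and sort key are copied from the URLs in this post; the function name `search_url` and its `exclude_google` flag are made up for illustration, and whether the site still accepts this exact query syntax is an assumption.

```python
from urllib.parse import quote, urlencode

# Endpoint and query terms are taken from the two URLs quoted in the post.
BASE = "http://www.archive.org/search.php"
CORE = ("(collection:americana AND format:pdf AND mediatype:texts)"
        " AND -mediatype:collection")


def search_url(exclude_google: bool = False) -> str:
    """Build the 'recent Americana PDFs' search URL (hypothetical helper).

    With exclude_google=True, items whose description mentions Google
    are filtered out, as in the second URL.
    """
    query = CORE
    if exclude_google:
        query = "-description:(Google) AND " + query
    # sort=-publicdate lists the newest additions first.
    params = {"query": query, "sort": "-publicdate"}
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return BASE + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(search_url())
    print(search_url(exclude_google=True))
```

Changing the collection name or the excluded description term in the query string should give the equivalent search for other collections, assuming they are indexed the same way.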

Examples of missing images, all from books taken from the former URL a few minutes ago, can be seen here:

http://www.archive.org/stream/economicmininga01lockgoog#page/n410/mode/2up
http://www.archive.org/stream/edinburghphilos10edingoog#page/n381/mode/2up
http://www.archive.org/stream/earthadescripti01reclgoog#page/n34/mode/2up

Presumably Google removed these images in order to improve the accuracy of OCR conversion, but it's a shame that these files are being added in such large numbers to the Internet Archive when, one would hope, better scans must be available somewhere.

Does anyone know why these defective versions, rather than versions with the illustrations intact, are being added? Do the libraries at which the books were scanned (Harvard University, University of California, etc.) know that defective versions of their books are being added to the Internet Archive? Can anything be done to replace those scans with better ones?

Poster: stringybark Date: December 11, 2009 07:15:40pm
Forum: texts Subject: Re: Missing images from Google book scans

In my experience, the Google digitisation is done with quantity rather than quality in mind. Eventually I concluded that they have an 'acceptable level of defect' policy where, provided only around 1 percent of pages are spoiled, the quality standards are met. So all "user tpb" books have a few dud pages. The contrast with the older, far better-quality, MSN-sponsored digitisation is stark.

Poster: Time Traveller Date: December 11, 2009 10:13:36pm
Forum: texts Subject: Re: Missing images from Google book scans

Seeing that they might soon lose the rights to lots of books, it might be that the more they have online today, the more rights they have to keep these books.

But they are a profit-making business of sorts, while volunteers do the scanning for the IA: a labour of love as opposed to a salary from Google.

Peter

Poster: stbalbach Date: December 11, 2009 10:09:17pm
Forum: texts Subject: Re: Missing images from Google book scans

Concur with stringybark. Also, there's been a problem with public-domain books disappearing from Google's full-view access. Presumably Google wants to sell those books in the future once it resolves the legalities, although there may be other reasons. User tpb is doing a great service by getting books off Google ASAP while it's still possible, before they disappear behind limited-preview mode (i.e., a paywall). Even if they are poor quality, it's better than nothing!

Stephen

Poster: Time Traveller Date: December 11, 2009 11:10:10pm
Forum: texts Subject: Re: Missing images from Google book scans

The arrangement that Google has, where it owns out-of-copyright books or something until authors opt out, is causing lots of discussion and legal action.

It may be that, with it all in the news media, authors are opting out because they did not know before how Google was using their book(s).

Peter
